
    Comparing Task Simplifications to Learn Closed-Loop Object Picking Using Deep Reinforcement Learning

    Enabling autonomous robots to interact in unstructured environments with dynamic objects requires manipulation capabilities that can deal with clutter, changes, and object variability. This paper presents a comparison of different reinforcement-learning-based approaches for object picking with a robotic manipulator. We learn closed-loop policies that map depth camera inputs to motion commands and compare different approaches to keep the problem tractable, including reward shaping, curriculum learning, and warm-starting the full problem with a policy pre-trained on a task with a reduced action set. For efficient and more flexible data collection, we train in simulation and transfer the policies to a real robot. We show that, using curriculum learning, policies with a sparse reward formulation can be trained at rates similar to those with a shaped reward, and that they achieve success rates comparable to the policy initialized on the simplified task. We successfully transferred these policies to the real robot with only minor modifications to the depth image filtering. We found that warm-starting the training with a heuristic was useful to enforce desired behavior, while policies trained from scratch using a curriculum coped better with unseen scenarios in which objects are removed.
    Comment: 8 pages, video available at https://youtu.be/ii16Zejmf-
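
    As a rough illustration of the curriculum idea described above, the following Python sketch trains a toy 1-D picking task with a sparse (success-only) reward and widens the object spawn range once the recent success rate is high. The environment, policy, and thresholds (ToyPickingEnv, naive_policy, the 90% advancement criterion) are illustrative assumptions, not the paper's actual setup.

        import random

        class ToyPickingEnv:
            """Toy 1-D picking task: the gripper starts at 0 and must reach the object."""
            def __init__(self, max_object_distance):
                self.max_object_distance = max_object_distance  # curriculum knob

            def run_episode(self, policy, steps=50):
                pos = 0.0
                goal = random.uniform(0.1, self.max_object_distance)
                for _ in range(steps):
                    pos += policy(goal - pos)        # closed-loop action
                    if abs(pos - goal) < 0.05:
                        return 1.0                   # sparse reward: success only
                return 0.0                           # no shaping on failure

        def naive_policy(error):
            # stand-in for a learned policy; steps a fixed amount toward the object
            return 0.1 if error > 0 else -0.1

        # Curriculum: widen the spawn range once the recent success rate exceeds 90%.
        distance, results = 0.5, []
        for episode in range(2000):
            env = ToyPickingEnv(distance)
            results.append(env.run_episode(naive_policy))
            if len(results) >= 100 and sum(results[-100:]) / 100 > 0.9 and distance < 2.0:
                distance += 0.25                     # advance the curriculum
                results.clear()
        print(f"final curriculum distance: {distance:.2f}")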

    Incremental Object Database: Building 3D Models from Multiple Partial Observations

    Collecting 3D object datasets involves a large amount of manual work and is time consuming. Obtaining complete models of objects requires either a 3D scanner that covers all surfaces of an object or rotating the object to observe it completely. We present a system that incrementally builds a database of objects as a mobile agent traverses a scene. Our approach requires no prior knowledge of the shapes present in the scene. Object-like segments are extracted from a global segmentation map, which is built online from segmented RGB-D images. These segments are stored in a database, matched against each other, and merged with previously observed instances. This allows us to create and improve object models on the fly and to use these merged models to also reconstruct unobserved parts of the scene. The database contains each (potentially merged) object model only once, together with a set of poses where it was observed. We evaluate our pipeline on one public dataset and on a newly created Google Tango dataset containing four indoor scenes, with some of the objects appearing multiple times, both within and across scenes.
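
    The following Python sketch illustrates the match-and-merge step of such an incremental database, assuming each segment has been reduced to a point cloud in a world frame. The centroid-distance matching criterion and all names (ObjectDatabase, merge_radius) are simplifying assumptions standing in for the paper's actual segment matching.

        import numpy as np

        class ObjectDatabase:
            def __init__(self, merge_radius=0.1):
                self.models = []        # one (potentially merged) point cloud per object
                self.poses = []         # observation poses per object
                self.merge_radius = merge_radius

            def insert_segment(self, points, pose):
                """Match a new segment against stored models; merge on a hit, else add."""
                centroid = points.mean(axis=0)
                for i, model in enumerate(self.models):
                    if np.linalg.norm(model.mean(axis=0) - centroid) < self.merge_radius:
                        # merge: accumulate geometry and remember where it was seen
                        self.models[i] = np.vstack([model, points])
                        self.poses[i].append(pose)
                        return i
                self.models.append(points)
                self.poses.append([pose])
                return len(self.models) - 1

        db = ObjectDatabase()
        db.insert_segment(np.random.rand(100, 3), pose="frame_0")        # first object
        db.insert_segment(np.random.rand(100, 3) + 5.0, pose="frame_1")  # distinct object
        print(len(db.models), "object models in the database")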

    Object Modeling and Interactive Perception for Robot Manipulation

    A key challenge in robotic systems is how to interpret all the data coming from the sensors on board the system, and how to select an environment representation that benefits the task at hand. These challenges are particularly difficult for robots operating in cluttered environments, where objects are ambiguous and hard to distinguish. Moreover, if a robot needs to physically interact with objects, its internal representation must be informative enough to perform the intended interaction and flexible enough to handle the consequences of its actions. One common application that captures these challenges is object finding in clutter, which, in addition to passive visual inspection, often requires interaction to reveal hidden parts of the environment. To interact, robots need some knowledge about the target object, e.g. an object model, and an environment representation that, together with proprioceptive sensing, contains sufficient information to complete a task successfully. This thesis aims to develop an interactive perception framework for robots, focusing on (i) how to generate and update object models and (ii) how to develop a flexible planning framework for object interaction.

    The first part of this thesis presents CLUBS, a dataset of household objects and of box scenes containing those objects, of the kind commonly found in warehouses. It describes the process of collecting data with multiple RGB-D cameras mounted on a robotic arm and of generating accurate object models. An image annotation pipeline and an approach for estimating 3D object bounding boxes are also proposed. The box scenes in the dataset contain up to 40 household objects in different configurations and at different levels of clutter. The variability of the provided data is beneficial for evaluating tasks such as object detection, segmentation, reconstruction, and object completion. Since these scenes also contain 2D object bounding boxes, per-frame pixel-wise labels, and 3D object bounding boxes in a world frame, they are valuable for training learning-based algorithms. The data, as well as all the tools to manipulate it, are made publicly available.

    The second part of this thesis proposes a system for 3D perception with a twofold objective: first, to detect target objects from the database in cluttered environments; second, to incrementally build a database of object models using an RGB-D camera, with or without prior knowledge about the objects in the environment. The system uses a Truncated Signed Distance Function (TSDF) volumetric representation to store geometry and object instance information obtained from a depth-based instance segmentation algorithm. The evaluation on different datasets, including the CLUBS dataset, shows that the system detects objects at different levels of clutter and generates appealing object models. Even though the object models generated with the incremental database are less accurate and less complete than those obtained from the dataset, they are still a faithful representation of the actual objects and are valuable for further reasoning about the scene.

    In the last part of the thesis, a Reinforcement Learning (RL) approach is proposed for active and interactive perception-based object search in clutter. If the object detector does not find the target object even after the camera moves, the object may be occluded by other parts of the environment or not present at all. To address such scenarios, the robot needs to interact with the environment and remove some of the occlusions, so a system capable of planning such actions is proposed. An experimental evaluation on both a simulated and a real-world robotic system shows that the proposed system, trained in simulation only, solves the task efficiently and with a high success rate. Its performance is also compared to baseline approaches, showing that the proposed method is more time efficient with comparable success rates.

    The two main components of this thesis address 3D perception in clutter and action selection for object finding. The results show that with such a system the robot can discover new objects in the environment and complete the models of objects already in the database. This can be done online and without any prior knowledge, making the approach general and flexible. Beyond merely moving the camera, this thesis proposes an approach that enables robots to plan actions to physically interact with the environment in order to reveal its hidden parts. Deployed on a real robot, such a framework allows the robot to understand the environment at the object level and, based on that representation, plan meaningful actions for a given task, e.g. object finding in clutter.
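
    As a rough illustration of the TSDF representation mentioned above, the following Python sketch fuses depth measurements along a single camera ray using the common weighted-running-average update; the voxel grid layout and the instance labels used in the thesis are omitted, and all names are illustrative.

        import numpy as np

        def integrate(tsdf, weights, voxel_depths, surface_depth, trunc=0.05):
            """Fuse one depth measurement into the voxels along a ray."""
            sdf = surface_depth - voxel_depths     # signed distance to the surface
            mask = sdf > -trunc                    # skip voxels far behind the surface
            tsdf_obs = np.clip(sdf, -trunc, trunc) / trunc
            # weighted running average of all observations so far
            tsdf[mask] = (tsdf[mask] * weights[mask] + tsdf_obs[mask]) / (weights[mask] + 1)
            weights[mask] += 1
            return tsdf, weights

        voxel_depths = np.linspace(0.0, 1.0, 21)   # voxel centers along one ray
        tsdf = np.zeros_like(voxel_depths)
        weights = np.zeros_like(voxel_depths)
        for measurement in (0.52, 0.50, 0.51):     # noisy depth readings of one surface
            tsdf, weights = integrate(tsdf, weights, voxel_depths, measurement)
        print(tsdf.round(2))                       # zero-crossing near depth 0.5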
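
    The interactive search loop from the last part can be sketched as follows: move the camera while the detector is uncertain, and fall back to a pushing interaction once viewpoint changes stop helping. The policy below is a hand-written stub standing in for the learned RL policy, and all names, probabilities, and costs are illustrative assumptions.

        import random

        ACTIONS = ["move_camera", "push_occluder"]

        def detector_confidence(state):
            # stand-in for a detector: confidence grows as occlusion is removed
            return 1.0 - state["occlusion"]

        def step(state, action):
            if action == "move_camera":
                state["occlusion"] = max(0.0, state["occlusion"] - random.uniform(0.0, 0.1))
            else:  # pushing reveals more of the scene, but costs extra time
                state["occlusion"] = max(0.0, state["occlusion"] - random.uniform(0.1, 0.4))
                state["time"] += 2
            state["time"] += 1
            return state

        def policy(state):
            # stub: prefer camera motion first, interact once views stop paying off
            return "move_camera" if state["views"] < 3 else random.choice(ACTIONS)

        state = {"occlusion": 0.9, "time": 0, "views": 0}
        while detector_confidence(state) < 0.8:
            action = policy(state)
            state = step(state, action)
            if action == "move_camera":
                state["views"] += 1
        print(f"found target after {state['time']} time steps")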